Computer simulations of language change notes

This website collects my personal notes on Computer simulations of language change. These notes are provided to bring full transparency to my research process. Of course, since they are only notes, they do not reflect my final thoughts on a topic, and should not be interpreted as such. To read finished papers, please consult my website. Do not use these notes as a basis for your own scientific research. Start from high-quality, peer-reviewed scientific literature instead.

Language and Complex Systems

Abstract

An understanding of language as a complex system helps us to think differently about linguistics, and helps us to address the impact of linguistic interaction. This book demonstrates how the science of complex systems changes every area of linguistics: how to make a grammar, how to think about the history of language, how language works in the brain, and how it works in social settings. Kretzschmar argues that to construct the best grammars of languages, it is necessary to understand the complex system of speech. Each chapter makes specific recommendations for how linguists should manage empirical data in order to form better generalizations about a language and its varieties. The book will be welcomed by students and scholars working in linguistics and English language, especially the study of language variation and the historical development of English.

p. 1

Introduction

the ‘cause’ of language

previously: Universal Grammar seen as a cause

↕

recent views

some effects arise without particular causes, as the result of random interactions of large numbers of elements in complex systems
⇒ effects emerge

p. 2

“order for free” (Kauffman 1995: 83)

we achieve regularity in behaviour without any simple cause
regularities do not occur as the result of particular causes

↓

speech / language in use

order in language emerges from the linguistic interactions of speakers

p. 3

The most basic assumption of generative and structural linguistics, that we speakers all share the system of a language, share the rules for a language, is simply wrong. We all participate in speech, but the language is a little different, both in its available features and in the frequency with which we use those features, for each one of us individually and for each of us as a participant in every group to which we belong.

generalisations

made after the fact to help us organise our perceptions

p. 5

Language and complex systems

History of language and complex systems

Definition of a complex system (Mitchell 2009: 13)

a complex system is “a system in which large networks of components with no central control and simple rules of operation give rise to complex collective behavior, sophisticated information processing, and adaptation via learning or evolution.”

“complex”?

not ‘complicated’
rather: ‘numerous’

Complexity science at Santa Fe Institute (SFI)

doing complexity science required a commitment to observation, experience, and experiment that is in balance with the transdisciplinary, out-of-the-box curiosity that gave rise to the original question. It’s not that outcomes in complex adaptive systems are repeatable as they are in many scientific disciplines; complex systems are by definition unpredictable, and often downright squirrelly. But finding the patterns embedded in complex systems requires a distinct brand of scientific rigor and methodological approaches that in many cases haven’t yet been invented.

p. 6

‘adversary’ of complexity science

cause-and-effect reductionism

way of working

still: empirical observation and rigorous methods
but: we do not expect that simple causes can be found for the effects we observe
- behaviour is embedded in a large network without a central control

While complexity science was taking off at SFI, it did receive some early allusive discussion in linguistics: Lindblom, MacNeilage, and Studdert-Kennedy published a 1984 paper on self-organizing processes in phonology; Paul Hopper presented his seminal paper called “Emergent Grammar” in Berkeley in 1987 (see Chapters 3 and 4); Ronald Langacker published a chapter titled “A Usage-Based Model” for cognitive linguistics in 1988. Gradually more papers attempting to use complex systems in linguistics appeared in the 1990s, such as Van Geert (1991). In 1996 Edgar Schneider presented a paper whose title was a question, “Chaos Theory as a Model for Dialect Variability and Change?” (published 1997). At that time, it had already been over twenty years since the original paper on climate by Edward Lorenz that asked the question, “Does the Flap of a Butterfly’s Wings in Brazil Set off a Tornado in Texas?” (1972), and over ten years since the founding of the SFI, where chaos theory was studied as part of the emerging field of complexity science. But it was very early for a student of language to consider the subject as a serious model for speech. In the same year, J. K. Chambers commented in a book review that “We will need a coterie of sociolinguists expert in chaos theory before we can make a start [at applications to our field]” (1996: 163). Chambers noted that the biggest problem for language applications then was that chaos theory seemed to require a long series of observations over time, a rare commodity for those who systematically record language in use.

Examples of complex systems

ants

not centrally controlled, yet seem highly organised
more in Mitchell 2009: 176-187

p. 8-9

features of an ant colony

random behaviour: choice between foraging, nest building and defence is not deterministic!
necessary: if the colony focused on one sort of behaviour, it might be vulnerable
but: communication among ants might influence behaviour needed at that moment

p. 9

Game of Life

p. 10

RIP John Conway 😢

Game of Life

simple rules determine whether a cell is dead or alive

initial state

very important
depending on the initial configuration, behaviour might die out or be perpetual
≈ sensitive dependence on initial conditions

cellular automaton

a complex system based on simple cells
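The importance of the initial state can be checked with a small simulation. This is a minimal sketch of Conway's standard rules (a live cell survives with two or three live neighbours; a dead cell becomes alive with exactly three), on an unbounded grid stored as a set of live (row, col) coordinates. The function names and the set-based representation are my own choices for illustration, not anything from the book.

```python
from collections import Counter


def step(live):
    """Advance Conway's Game of Life by one generation.

    `live` is a set of (row, col) pairs of live cells on an unbounded grid.
    """
    # For every cell next to at least one live cell, count its live neighbours.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 live neighbours; survival with 2 or 3.
    # Live cells with no live neighbours never appear in the counter, so they die.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


def run(live, generations):
    """Apply `step` repeatedly and return the final set of live cells."""
    for _ in range(generations):
        live = step(live)
    return live


# Two initial states that differ by a single cell behave completely differently.
blinker = {(1, 0), (1, 1), (1, 2)}  # three in a row: oscillates forever
pair = {(1, 0), (1, 1)}             # two in a row: dies out after one step
```

The same rules applied to almost the same starting pattern give either perpetual activity (the blinker flips between horizontal and vertical) or extinction, which is the point made above about the initial state.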

p. 11

Principles of complex systems

Properties of complex systems

Basic principles of complex systems

p. 19

1. continuing dynamic activity in the system

so: no static structure

2. random interaction of large numbers of components

components don’t stand still in a hierarchical arrangement of types

3. exchange of information with feedback

probabilistic feedback keeps the system from becoming stuck in rule-bound relations

4. reinforcement of behaviors

reinforcement of behaviours from feedback creates non-linear distributions of units, as opposed to random or statistically normal distributions
- at every level of scale instead of at just one level characterised by homogeneous unity

5. emergence of stable patterns without central control
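Principle 4 can be made tangible with a small simulation. The sketch below is not from the book: it uses a copying process of the kind Herbert Simon (1955) proposed for skewed word-frequency distributions. Each new token is either a brand-new variant (with a small probability, here an assumed p_new = 0.1) or a copy of a randomly chosen earlier token, so variants that are already frequent get reinforced. For contrast, uniform_process picks among a fixed number of variants with no feedback at all. All function names and parameter values are hypothetical choices for illustration.

```python
import random
import statistics
from collections import Counter


def copying_process(n_tokens, p_new=0.1, seed=0):
    """Simon-style reinforcement: reuse earlier tokens in proportion to their frequency."""
    rng = random.Random(seed)
    tokens = [0]        # the very first token is variant 0
    next_variant = 1
    for _ in range(n_tokens - 1):
        if rng.random() < p_new:
            tokens.append(next_variant)  # innovation: a new variant enters
            next_variant += 1
        else:
            # Picking a random past token means a variant is chosen with
            # probability proportional to how often it has already been used.
            tokens.append(rng.choice(tokens))
    return Counter(tokens)


def uniform_process(n_tokens, n_variants, seed=0):
    """No feedback: every token picks one of n_variants with equal probability."""
    rng = random.Random(seed)
    return Counter(rng.randrange(n_variants) for _ in range(n_tokens))


def top_to_median(counts):
    """Ratio of the most frequent variant's count to the median count."""
    values = list(counts.values())
    return max(values) / statistics.median(values)


reinforced = copying_process(20_000)
flat = uniform_process(20_000, n_variants=2_000)
```

With reinforcement, a handful of variants take a large share of the tokens while most variants occur only once or twice, so the top-to-median ratio is large; without feedback, counts cluster around the mean and the ratio stays small. This is only one simple way to produce a skewed distribution, not a claim about how speech actually works.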

p. 11

More chaos echoes

The mathematician Benoit Mandelbrot claimed that “many patterns of Nature are so irregular and fragmented, that, compared to [standard geometry] Nature exhibits not simply a higher degree but an altogether different level of complexity” (1982: 1). His treatment of natural forms like the geometry of coastlines presented problems that could not be solved with traditional methods but required a new nonlinear mathematics, what he called “fractals.” Fractals are familiar to many of us through repeating graphic designs like the “Koch islands” in Figure 1.3. The basic properties of fractals – including scaling properties, as illustrated here – characterize many objects of study in the physical, natural, and social sciences, not just graphic designs.

Koch Island (p. 12)

p. 12

Types of systems

equilibrium system	non-equilibrium system
closed	open
do not exchange matter or energy outside the system	exchange energy or matter outside the system
components are balanced	components are continually negotiating

Equilibrium systems

Low-energy equilibrium system

Kauffman offers the example of dropping a ball down the side of a bowl: the ball will roll up and down the sides but will eventually come to rest at the bottom of the bowl, at low-energy equilibrium.

High-energy equilibrium system

In the case of an energetic equilibrium system, again in Kauffman’s example, if we put a quantity of gas molecules into a tank, the molecules do not stay ordered in a group at the point of entry; they keep moving around in the tank, and according to the ergodic theory they move randomly through all of the statistically possible states of arrangement.

p. 12-13

Non-equilibrium system

Kauffman sets the counter example of the small whirlpool that forms near drains: this ordered structure will be maintained as long as the drain remains open and water continues to flow. The order in such a non-equilibrium system is sustained by persistent dissipation of matter and energy, and thus the whirlpool can be called a “dissipative structure” of the kind described by Prigogine. No stirring is required to start the whirlpool, no single and simple cause. The water is subject to natural laws like gravity that makes the water drain, but no collection of laws completely explains the whirlpool because randomness in the molecules and conditions is involved in the emergence of every particular whirlpool.

p. 15

scale-free network

a network “in which the distribution of links to nodes follows a power law”
the vast majority of nodes have very few connections, while a few important nodes (we call them hubs) have a huge number of connections
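A scale-free network can be grown with the preferential-attachment model of Barabási and Albert. It is not discussed in these notes and is used here only as a standard illustration: every new node links to m existing nodes chosen with probability proportional to their current degree. This is a minimal pure-Python sketch; n_nodes, m and the seed are arbitrary assumptions.

```python
import random
import statistics


def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a network node by node; return a dict mapping node -> degree."""
    rng = random.Random(seed)
    # Start from a fully connected core of m + 1 nodes (each has degree m).
    degree = {i: m for i in range(m + 1)}
    # Each node is listed once per edge end, so uniform sampling from this
    # list picks nodes with probability proportional to their degree.
    endpoints = [i for i in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:          # m distinct existing nodes
            chosen.add(rng.choice(endpoints))
        degree[new] = m
        for target in chosen:
            degree[target] += 1
            endpoints.extend([target, new])
    return degree


degrees = preferential_attachment(5_000)
```

Most nodes end up with only two or three links, while the oldest, most-linked nodes keep attracting new ones and become hubs with a degree many times the median.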

p. 17

chaotic systems

deterministic → small changes in initial conditions lead to significantly different future behaviour
- complex systems do require sufficient conditions for their operations, such as enough live cells in the Game of Life to allow the rules to operate in a complex way, but they are not deterministic
also: intermittency / cyclic behaviour

We still get a whirlpool or bubble patterns at different water levels, or if someone should wade in the water and disturb the flow.
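Sensitive dependence on initial conditions in a deterministic system is usually demonstrated with the logistic map x → r·x·(1 − x). This textbook example is not taken from the book; it is included as an illustration under my own assumptions (r = 4, a starting value of 0.2, a perturbation of 10⁻¹⁰). The rule contains no randomness, yet two almost identical starting points soon follow completely different paths.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the deterministic logistic map x -> r * x * (1 - x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)   # a tiny change in the initial condition

# The gap starts out negligible but grows roughly exponentially.
gaps = [abs(x - y) for x, y in zip(a, b)]
```

Running the same input twice always gives the same trajectory (the system is deterministic), but after a few dozen steps the two trajectories are unrelated, which is what distinguishes chaos from the non-deterministic behaviour of complex systems described above.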

p. 17-19

Descriptions of self-organization typically use data collected in time series, and apply complex mathematical operations to generate “attractors,” as shown by Guastello and Liebovitch for psychology (Figure 1.9). As for the application of successive values to make Mandelbrot’s San Marco Dragon with a formula, successive measurements of real phenomena over time may create patterns when graphed that tend towards a fixed point (A), or create an oscillation or orbital shape (C, E), or other patterns whose regularity may be more difficult to see (chaotic, or “strange” attractors). Each successive moment in time corresponds to a “state” of the phenomenon being observed, whether it is traffic in a city or economic activity in a country or evolutionary development in a biological system.
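Attractors of the kinds mentioned here (a fixed point, an oscillation, chaotic behaviour) can be reproduced in a toy system by varying the parameter r of the logistic map. This is an analogy, not the method behind Figure 1.9; the values r = 2.8, 3.2 and 3.9 are my own choices, commonly used in textbook treatments. After discarding a long transient, the sketch counts how many distinct states the system keeps visiting.

```python
def long_run_states(r, x0=0.3, burn_in=1_000, keep=16):
    """Return the states the logistic map visits after a transient is discarded."""
    x = x0
    for _ in range(burn_in):          # let the system settle onto its attractor
        x = r * x * (1 - x)
    states = []
    for _ in range(keep):
        x = r * x * (1 - x)
        states.append(x)
    return states


def distinct_states(states, digits=6):
    """Count distinct states, ignoring floating-point noise below 10**-digits."""
    return len({round(s, digits) for s in states})


fixed_point = long_run_states(2.8)   # settles on one value
oscillation = long_run_states(3.2)   # alternates between two values
chaotic = long_run_states(3.9)       # keeps wandering; no short cycle
```

For r = 2.8 every remaining state equals the fixed point 1 − 1/r, for r = 3.2 the system alternates between two values, and for r = 3.9 it keeps visiting new states.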

p. 19

Chaotic systems ⟷ complex systems

complex systems	chaotic systems
settle into a very small number of states	occupy a very large number of states
undisturbed by small changes

attractor

“another term for regularity or ordered behavior achieved by the elements being observed”

Language as a complex system

The multi-dimensionality of language

time series in language studies

difficult!
1. gathering data is expensive
2. our data points are rich

p. 20

We do look back in time in historical linguistics to observe change, but we do not have enough information about how the members of a population were actually speaking at any given time to make any more than speculative judgments about the state of the language in the remote past.

The situation is little better for the recent past, when we may have more information from writing or even from recorded speech, but still no fair way to estimate how all of the members of a population were actually speaking (see Chapter 7).[4] This means that we need to look for the effects of complex systems in speech in the surveys and other collections of data that we can actually carry out.

[4] The problem of rich data also introduces greater dimensionality for the description of systems. Stephen Wolfram (2002) required over 1,000 pages to prepare a comprehensive description of the patterns created by the successive states of a one-dimensional cellular automaton (a set of eight boxes in a row, which can be either black or white), according to the application of different rules for how the on/off pattern would change between states. Kauffman’s light bulbs were arranged in a two-dimensional grid. Speakers as agents interact with each other in many different ways, and speakers can choose between many possibilities for any linguistic feature we wish to observe.

(my emphasis, also for the italics)
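The one-dimensional cellular automaton in the footnote can be made concrete with Wolfram's numbering scheme for "elementary" rules: a cell and its two neighbours form a 3-bit number from 0 to 7, and bit k of the rule number (0 to 255, so 256 possible rules) gives the cell's next state for neighbourhood k. This sketch assumes a ring of eight cells (the row wraps around) to match the footnote's eight boxes; the wrap-around, rule 90 and the starting row are my own choices.

```python
def next_row(cells, rule):
    """One step of an elementary cellular automaton on a ring of 0/1 cells."""
    n = len(cells)
    out = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighbourhood = 4 * left + 2 * centre + right   # a number from 0 to 7
        out.append((rule >> neighbourhood) & 1)         # look up that bit of the rule
    return out


def history(cells, rule, steps):
    """Return the successive states (rows) produced by repeatedly applying `rule`."""
    rows = [cells]
    for _ in range(steps):
        rows.append(next_row(rows[-1], rule))
    return rows


start = [0, 0, 0, 1, 0, 0, 0, 0]   # eight boxes, one black cell
for row in history(start, rule=90, steps=4):
    print("".join("#" if c else "." for c in row))
```

Even this tiny system shows why a full catalogue takes so much space: each of the 256 rules, combined with each starting row, produces its own sequence of states, and on such a small ring some patterns die out while others cycle.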

↳ at any moment, language is highly multi-dimensional, but also stable

↕

‘chaotic’ view of language

long state cycles (always changing, unstable yet cyclic)

Language as a complex system proper

complex systems

can be used to describe language varieties

p. 21

1. continuing dynamic activity in the system

new conversations and writings occur continuously
⇒ language needs to remain in use in order to stay alive

p. 22

2. random interaction of large numbers of components

self-organisation in the form of geographical, social and textual clustering of speech sounds and words
associated in different ways with particular localities, groups or text types

3. exchange of information with feedback

probabilistic feedback keeps the system from becoming stuck in rule-bound relations

4. reinforcement of behaviors

p. 24

Zipf’s Law

a frequency ranking of words in texts
frequency is roughly inversely proportional to rank

↓

quantity	frequency
“few”	very frequent words
“some”	moderately frequent words
“most”	low frequency words

p. 28

[Figure: the same curve plotted with linear scales on both axes and with logarithmic scales on both axes]

also called an A-curve
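A rank-frequency profile is easy to compute: count the tokens, sort the counts from most to least frequent, and plot count against rank. On log-log axes a power law becomes a straight line, so a least-squares slope near −1 is the classic Zipfian signature; on linear axes the same counts give the steep A-curve with its long tail. This is a minimal sketch: a straight-line fit on log-log axes is only a rough diagnostic, not a rigorous test for a power law, and the example counts are synthetic.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (item, count) pairs ordered from the most to the least frequent."""
    return Counter(tokens).most_common()


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank), ranks starting at 1."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic counts that follow Zipf's law: frequency ~ 1000 / rank.
zipf_counts = [round(1000 / rank) for rank in range(1, 51)]
print(loglog_slope(zipf_counts))   # close to -1
```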

p. 24

place of occurrence

phonological variants of a single word
lexical variants of a concept
also appears in subsamples

p. 31

We now have an answer for Edgar Schneider’s question about chaos theory and speech. No, chaos theory is not a model for dialect variability and change. But speech does constitute a complex system, “at the edge of chaos.”

p. 32

Using the A-curve for generalisation

We can use the A-curve to define the relationship between what people actually say or write and the generalisations that we want to make from that behaviour.

↓

most common variants on the A-curve

perceived as “normal” or “expected”
⇒ for any group (or for no group), we can deduce a ‘linguistic system’ (p. 33)

p. 33

↓

observational artifact

an artifact (i.e. linguistic system) built in our perception
linguistic systems do not “exist” in reality per se, but rather stem from our interpretation of reality
⇒ so: we can create many subcategorisations of language, but by definition these are subjective, unstable and conventional

p. 32

long tail variants on the A-curve

perceived as “different”

↳ distribution itself

gives users of speech consistent, high-quality input for perceptions

So, for example, a diphthongal pronunciation of fog is what we expect from women, but not what we expect from the LAMSAS speakers overall. We can expect ‘weeks without rain’ to be called a dry spell as a normal word, but accept that many other words are possible, and understandable, as different variants. We can perceive the top-ranked variants of any linguistic feature for groups at any level of scale, and the fact that different variants for a given feature will be ranked more highly in different groups helps us to distinguish the language behavior of the group.
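The idea that different groups have different top-ranked variants, and that the pooled data can rank them differently again, can be shown with a few lines of code. The counts below are hypothetical toy numbers invented purely to show the computation; they are not LAMSAS or any other survey data, and the group and variant names are placeholders.

```python
from collections import Counter

# Hypothetical toy counts, invented for illustration only (not survey data).
counts_by_group = {
    "group_a": Counter({"variant_x": 60, "variant_y": 25, "variant_z": 5}),
    "group_b": Counter({"variant_y": 30, "variant_x": 20, "variant_z": 10}),
}


def ranking(counts):
    """Variants ordered from most to least frequent."""
    return [variant for variant, _ in counts.most_common()]


# Pool all groups by adding their counters together.
pooled = sum(counts_by_group.values(), Counter())
for group, counts in counts_by_group.items():
    print(group, ranking(counts))
print("all speakers", ranking(pooled))
```

Here variant_y is what we would "expect" from group_b, but not from all speakers taken together, where variant_x ranks first.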

↳ ‘scale-free’ for language

just as fractals are scale-free geometrical figures in mathematics, the A-curve power law is scale independent
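In the standard mathematical sense, "scale-free" means that rescaling the variable only multiplies a power law by a constant, so the curve has the same shape at every scale. A short derivation, assuming a frequency function of the form f(x) = C·x^(−a) with constants C > 0 and a > 0:

```latex
f(x) = C\,x^{-a}
\quad\Longrightarrow\quad
f(\lambda x) = C\,(\lambda x)^{-a} = \lambda^{-a}\,C\,x^{-a} = \lambda^{-a} f(x),
\qquad
\log f(x) = \log C - a \log x .
```

The first identity says that zooming in or out (changing λ) leaves the shape unchanged; the second is why a power law appears as a straight line with slope −a on log-log axes. How exactly this maps onto the A-curves found in subsamples is an interpretation, not a claim from the book.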

p. 33

Social groups, personal choice and feedback

choice of form

speakers do not always choose the most popular form (else, there wouldn’t be an A-curve)
rather, they can choose whichever form fits the situation and the group

↓

p. 33

feedback and reinforcement

different situations call for different forms
people belong to different groups, and can position themselves through different variants

p. 34

Consequences of the A-curve for linguistic theory

linguistic “system”

does not exist → any variety we name actually exists as an observational artifact that comes from our perceptions of the available variants

↳ frequency effects

create a complex system which displays itself as an A-curve

p. 34-35

Linguistic systems as low-energy equilibrium systems are always observational artifacts of our perception, in effect transformations of speech data from its natural existence as part of a complex system.

Analysing language in the traditional way?

When we propose the existence of such hierarchical linguistic systems ([e.g. tree-based grammar]) based upon our perception of speech around us, as we certainly want to do and are justified to do by the distributional patterns of speech as a complex system, we need to be guided by the nonlinear distributional pattern of the evidence of language in use because no system that we describe is actually instantiated in the spoken interactions themselves.

(own interpolation)

p. 36

Linguistics, science, the humanities, and complex systems

skipped

p. 56

Usage-based linguistics and complex systems

p. 57

Paul Hopper (1987)

We will see that a complete single grammar for a language could never be motivated by detailed observation of speech production, because the patterned distribution of features and their variants that always emerges from the complex system of speech cannot be captured by the binary logic of formal linguistic systems.

(my own emphasis)

p. 58

What is grammar?

[W]hile grammar never exists as such in language in use, it can well exist as a description of regularities indirectly derived from speech performance by perceptual means. This is, in fact, just what all linguists do

p. 59

long tail of the A-curve

where older meanings/realisations are retained
but: also where novel features can be found as they enter the language

Paul Hopper and Elizabeth Traugott (1993)

Elizabeth Traugott slander

p. 61

Joan Bybee (2001)

Bybee (2001) focus

remains on abstract structure
however: usage-based idea is ‘incorporated’

↳ phonology

presented as a ‘structure’ on which frequency (as the result of usage) has an ‘impact’

networked units

Bybee presents the units of the structure (words, sounds) as stored in networks of related units

In network models, internal structure is emergent – it is based on the network of connections built up among stored units. The stored units are pronounceable linguistic forms – words or phrases stored as clusters of surface variants organized into clusters of related words . . . Units such as syllables and segments emerge from the inherent nature of the organization of gestures for articulation.

(Bybee 2001: 85)

gradient storage

each unit does not have just one form, but is rather a gradient

The view that phonological representations are self-organizing means that units of analysis, such as segments and syllables, are emergent units and are permitted to have gradient properties. This view does not insist upon one unit of uniform size for describing all speech, but rather proposes that the organization of linguistic material into units depends entirely upon the substantive properties of that material.

(Bybee 2001: 86)

p. 62

⚠ ↓ problem!

Bybee’s a priori assumption of linguistic structure

"Bybee has prejudged the outcome of emergence"

As we have seen in Chapter 1, every individual is at the nexus of many groups of people, regional and social, and each one of those groups will have its own distributional pattern of many variants for any feature we name.

There is no necessity to privilege the units that linguists normally talk about, like phonemes, which derive essentially from the simplifying generalization of a formal model.

p. 64

In her 2001 book Bybee tried to marry simple ideas of self-organization and emergence from complex systems with the assumption of structure, but she was doomed to fail, if influentially, by her superficial use of complexity and by her failure to break away from the assumption of structure.

Anthe’s summary

To recapitulate, the issue here is that Bybee assumes there is such a thing as ‘structure’, while according to Kretzschmar structure doesn’t exist in reality but is derived from our perception of reality. At the same time, Bybee ignores the fact that individuals don’t store ‘a grammar’ in their minds, but rather a collection of different forms and structures distributed in an A-curve-like manner, which can be employed depending on the situation (which manifests itself, for example, as social variation).

Janet Pierrehumbert (2001)

exemplar theory

focus shifted back to individuals, memory and cognitive processing of speech forms

p. 65

Lots of important things going on here

The exact phonetic details of a word’s pronunciation arise because

the word is retrieved from the lexicon, and
processed by the rules or constraints of the grammar
whose result (the surface phonological form of the word) is fed to a phonetic implementation component
[which] computes the articulatory and/or acoustic goals
which actualize the word as speech . . .

A second challenge arises from the fact that differential phonetic outcomes relate specifically to word frequency. Standard generative models do not encode word frequency.

(Pierrehumbert 2001: 138)

in the exemplar model, “each category is represented in memory by a large cloud of remembered tokens of that category” (Pierrehumbert 2001: 139)

categorisation

you hear a new token and compare it to the distribution of remembered tokens
⇒ goal: make the best decision on what category the new token belongs to

↳ influencing factors

what the ear can distinguish
how recently each exemplar was encountered

Schematic representation of categorisation structure

TODO this reminds me of the language acquisition paper from Speech Science. Let’s look that up again later.

The new token is symbolized by the asterisk at the bottom of the figure, and the task is to decide whether the token belongs to the category for the /ɛ/ phoneme or for the /ɪ/ phoneme along the one dimension presented here, F2. In this ambiguous case, the new token is closer to the main distribution of /ɪ/ remembered tokens, even though the actual distribution overlaps, and so it would be assigned to the /ɪ/ category.
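The decision in the figure can be sketched in code. This minimal version is loosely modelled on the idea described above rather than on Pierrehumbert's exact equations: every remembered token is a (category, F2, age) triple, each stored token contributes a similarity score that falls off exponentially with its distance from the new token, older tokens are down-weighted by an exponential memory decay, and the category with the highest total wins. The F2 values, the similarity width and the decay rate are invented for illustration.

```python
import math
from collections import defaultdict


def categorise(token_f2, exemplars, width=50.0, decay=0.0):
    """Assign a new token to the category whose exemplar cloud responds most.

    exemplars: iterable of (category, f2_in_hz, age) tuples, where age counts
    how long ago the token was heard (0 = just now).
    """
    scores = defaultdict(float)
    for category, f2, age in exemplars:
        similarity = math.exp(-abs(token_f2 - f2) / width)  # nearer tokens count more
        recency = math.exp(-decay * age)                      # older memories fade
        scores[category] += similarity * recency
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Invented F2 values (Hz) for two overlapping clouds of remembered tokens.
memory = [("ɪ", f2, 0) for f2 in (1960, 1990, 2010, 2030, 2050)] + [
    ("ɛ", f2, 0) for f2 in (1800, 1830, 1850, 1870, 1900)
]

label, scores = categorise(1940, memory)
```

With these numbers the ambiguous token at 1940 Hz is assigned to /ɪ/. Frequency is not stored separately: a category with more stored tokens near the boundary simply produces larger sums, so adding a few more /ɛ/ exemplars (or letting the /ɪ/ memories age with a non-zero decay) can flip the decision for the same token.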

p. 66

multidimensionality

any exemplar can be a member of more than one classification scheme at the same time
⇒ one exemplar might be included in classifications for a phoneme, a word, female speech, and “Mom” all at the same time

The exemplar approach associates with each category of a system a cloud of detailed perceptual memories. The memories are granularized as a function of the acuity of the perceptual system (and possibly as a function of additional factors). Frequency is not overtly encoded in the model. Instead, it is intrinsic to the cognitive representations for the categories. More frequent categories have more exemplars and more highly activated exemplars than less frequent categories (Pierrehumbert 2001: 142).

Benefits of the exemplar model

improved description of detailed phonetic knowledge that speakers have about their language
the mechanism for decisions about categories
the accommodation of frequency to some degree

Problems with Pierrehumbert model

1. it is too simple

actual categorisation takes place among many more categories (several dozens!)

2. the pattern is wrong

(the frequencies in 👁 ↑ are ‘not what we would expect from a complex system’)

p. 67

3. statistical learning in language acquisition

many studies in language acquisition show the importance of distributional learning and transitional probabilities
as such, the simple Pierrehumbert model of 👁 ↑ is hard to reconcile with this fundamental property of language learning and internal representation
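Transitional probability, the statistic behind much of the distributional-learning work mentioned here, is the conditional probability that unit y follows unit x: P(y | x) = count(x y) / count(x as the first member of a pair). This is a minimal sketch over a syllable stream; the syllables and the function name are made up for illustration.

```python
from collections import Counter


def transitional_probabilities(units):
    """Map each adjacent pair (x, y) to P(y | x) estimated from the sequence."""
    pair_counts = Counter(zip(units, units[1:]))
    first_counts = Counter(units[:-1])   # x only counts when something follows it
    return {
        (x, y): n / first_counts[x]
        for (x, y), n in pair_counts.items()
    }


# A made-up syllable stream in which "bi da ku" always occurs together,
# while what follows "ku" varies.
stream = "bi da ku pa do ti bi da ku go la bu pa do ti".split()
tp = transitional_probabilities(stream)
```

Inside the recurring chunk the probabilities are 1.0, while after "ku" they drop to 0.5; dips like this are the kind of cue to unit boundaries that distributional-learning studies discuss.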

p. 68

↳ role of frequency

still important, but more so in the role it plays within transitional and distributional learning

4. phoneme system

such a simplistic label system does not necessarily exist

5. prototype model

there is a central tendency among the exemplars that could be computed as an average
there is a Gaussian distribution
⇒ ignores the A-curve

While Pierrehumbert’s exemplar model makes an improvement in the management of frequency information over Bybee’s earlier high-level generalizations, it does not yet get all the way to complex systems because it addresses neither the nonlinear distribution of realizations nor their scale-free distribution.

Adele Goldberg (2006)

key point

“all levels of grammatical analysis involve constructions: learned pairings of form with semantic or discourse function, including morphemes or words, idioms, partially lexically filled and fully general phrasal patterns” (Goldberg 2006: 5)

p. 69

usage-based idea

very well represented here!

One particular verb accounts for the lion’s share of tokens of each argument frame considered in an extensive corpus study . . . The dominance of a single verb in the construction facilitates the association of the meaning of the verb in the construction with the construction itself, allowing learners to get a “fix” on the construction’s meaning. . . . In this way, grammatical constructions may arise developmentally as generalizations over lexical items in particular patterns.

↳ traditional linguistic practice

still lingers in “grammatical constructions” as the outcome of “making generalisations”
for the same reason as Bybee in 👁 ↑

The good

1. generalisation

distributional effect that arises from a particular group of speakers is captured
connected to individual cognitive development

2. association of constructions with cognitive processing

connects construction grammar with exemplar theory (e.g. Bod 1998; Skousen 1989)

3. free manner of categorisation

parallels exemplar theory
grammatical forms can now also be ‘categorised’ (in addition to just speech sounds)
avoids the reification of traditional categories

p. 70

The bad

it just doesn’t go far enough

Michael Tomasello (2003)

view on grammar

an inventory of constructions

“A plausible way to think of mature linguistic competence, then, is as a structured inventory of constructions, some of which are similar to many others and so reside in a more core-like center, and others of which connect to very few other constructions (and in different ways) and so reside more towards the periphery.”

(2003: 5–6)

recurrence

an important process for the building of ‘language’
but: complex systems are never mentioned

(some language acquisition data to prove a point – nothing special)

p. 73

Nick Ellis and Diane Larsen-Freeman (2009)

A usage-based theory of grammar in which the cognitive organization of language is based directly on experience with language. Rather than being an abstract set of rules or structures that are only indirectly related to experience with language, we see grammar as a network built up from the categorized instances of language use . . . The basic units of grammar are constructions, which are direct form-meaning pairings that range from the very specific (words or idioms) to the more general (passive construction, ditransitive construction), and from very small units (words with affixes, walked) to clause-level or even discourse-level units . . . Because grammar is based on usage, it contains many details of co-occurrence as well as a record of the probabilities of occurrence and cooccurrence. The evidence for the impact of usage on cognitive organization includes the fact that language users are aware of specific instances of constructions that are conventionalized and the multiple ways in which frequency of use has an impact on structure.

usage-based linguistics ideas

constructions as the basic unit
"record of probabilities" → ≈ exemplar theory
network built from categorised instances of use

their definition of grammaticalisation

the universal process that describes the operation of the complex adaptive system of speech

p. 74

Language as a [complex adaptive system] of dynamic usage and its experience involves the following key features:

The system consists of multiple agents (the speakers in the speech community) interacting with one another.
The system is adaptive; that is, speakers’ behavior is based on their past interactions, and current and past interactions together feed forward into future behavior.
A speaker’s behavior is the consequence of competing factors ranging from perceptual mechanics to social motivations.
The structures of language emerge from interrelated patterns of experience, social interaction, and cognitive processes.

(Ellis and Larsen-Freeman 2009a: 2)

↳ structures of language

still, prejudgement of the outcome of emergence
⇒ grammar shown as a state instead of a network/processing/ongoing thing

language change

propelled by social changes
cause the selection mechanism to skew towards a new focal point

↳ problem

“freezing” → not enough attention given to the dynamic aspect of complex systems

preference for, or “fixation” of, a grammatical state takes the dynamic movement of the complex system and freezes it, so that one variant of a feature becomes “grammatical” in the sense of having been selected

p. 76

[F]requency distributions occur for any constructions we decide to nominate, but linguists are the ones who create the categories, who make the grammar. The operation of complex systems does not create a network out of which we observe a state as an object. The complex system merely creates a nonlinear frequency distribution, and linguists think that they see a grammar in the distribution after the fact.

Joan Bybee (2010)

Language as a complex adaptive system

When linguistic structure is viewed as emergent from the repeated application of underlying processes, rather than given a priori or by design, then language can be seen as a complex adaptive system . . . The primary reason for viewing language as a complex adaptive system, that is, as being more like sand dunes than like a planned structure, such as a building, is that language exhibits a great deal of variation and gradience.

Gradience refers to the fact that many categories of language or grammar are difficult to distinguish, usually because change occurs over time in a gradual way, moving an element along a continuum from one category to another . . .
Variation refers to the fact that the units and structures of language exhibit variation in synchronic use, usually along the continuous paths of change that create gradience.

(Bybee 2010: 2)

p. 77

↳ gradience

adaptation of categories over time

↳ variation

the simultaneous presence of alternative usages at any given time

Constructions also have exemplar representations, but these will be more complex, because, depending upon how one defines them, most or all constructions are partially schematic – that is, they have positions that can be filled by a variety of words or phrases.

Problems

1. prototypes

still there (2010: 106–114)

2. grammaticalisation

still treated as the process “by which grammatical items and structures are created” (2010: 106)

3. traditional categories

still used
English auxiliary ‘can’, top-level generalisations … (which, according to Kretzschmar, thus don’t exist)

4. grammaticalisation paths as strange attractors

does not apply the technical term “strange attractor” from complexity science in the way it is usually used for chaotic patterns
(this is something I also noticed; my own note)

p. 78

Bybee does not yet deal extensively with scaling or with characteristic frequency profiles. She has become the most advanced advocate of complex adaptive systems among usage-based linguists along with Ellis and Larsen-Freeman, and yet her work still has a way to go to escape from the Scholasticism of linguistics that Walker Percy raised, and to become more like the Galileo that Percy said we needed. As with the Five Graces, Bybee continues to carry old baggage from formal linguistics.

Usage-based linguistics and complex systems

What have we learned about usage-based linguistics and complex systems?

1. non-linear distributions

we can expect non-linear distributions of variants for any linguistic feature
speech sounds, words, constructions …

2. linguistic categories are not naturally given and discrete

over time, expressions change in such a way that any boundaries for categories that we may wish to assert at one moment may well be breached a moment later

Exemplar theory suggests a cognitive process by which we can build frequency profiles for speech sounds, although the process of categorization remains an issue there. Still, we need to use categories so that we can count variant expressions and observe the non-linear distributions. The idea of “constructions” is perfect for this purpose, since the use of constructions does not entail a commitment to a fixed hierarchy of categories, and it may be applied to features as small as speech sounds or as large as discourse patterns.

There is no reason that we cannot use traditional terms like “noun” or “verb” or “ditransitive” to name them, as long as we do not invoke an entire hierarchical system when we do so.

grammaticalisation as defined by Hopper (1987)

a process of continual movement that is contingent on time, place, and circumstance and that does not allow grammar ever to be directly observable
does not translate into categories for any state of any language

Neither does the nonlinear distribution constitute evidence that there is a particular cause for the top-ranked variant to be where it is. The complex interaction of recurrence, frequency, and setting for language use rules out any simple cause for the state of a feature or language at any given time, except to say that the process of the complex system of speech always creates nonlinear distributions.

(my emphasis)

p. 79

3. scaling

an unavoidable property of the complex system of speech
(not really sure what he means here)

If we take care to match the assessments we make to the particular populations from which our data comes, we can make better generalizations, whether for a language as a whole, for national or regional varieties, for social groups, or for particular kinds of texts. The complex interaction of recurrence, frequency, and setting for language look different from every point of view at every scale of analysis.

4. speakers as agents

speakers have a lot of choice in what they say and how they say it
no single ‘grammar’ can describe this complexity

When we are aware of the magnitude of the distributional problem, we certainly know that whatever experiment we conduct will never be enough to give us the kind of big-picture answers that formal linguists have generated.

↓

big typological conclusion

there is really only one language, “the human language”
the phenomena that we have perceived as different languages are actually levels of scale within the overall complex system of human language

p. 80

If we stop trying to bring along the old baggage of fixed grammars and selected usages, we are free to choose constructions to count and free to see nonlinear frequency patterns wherever they occur.

↳ constructions

there are infinitely many constructions, but that is the entire point!
⇒ there is no overarching grammar

Every experiment that we conduct, if it adequately describes the constructions it studies and the population of speakers who use them, makes another contribution to our knowledge of the complex system of language in use.

What we stand to gain more generally by such repeated studies, even though we know in advance that studies at different scales of analysis will not be comparable in their findings, is a new understanding of the operation of complex systems in human culture. The big picture is not a grammar, but instead a new way of understanding the humanities as the emergence of linguistic and cultural patterns out of continual human interaction. Continual movement at every level of scale is crucial to that understanding.

p. 81

Grammar and complex systems

The most important point is that constructions are nothing more or less than patterns of usage, which may therefore become relatively abstract if these patterns include many different kinds of specific linguistic symbols. But never are they empty rules devoid of semantic content or communicative function. In usage-based approaches, countless rules, principles, parameters, constraints, features, and so forth are the formal devices of professional linguists; they simply do not exist in the minds of speakers of a natural language.

(Tomasello 2003: 100)

(my emphasis; it’s not clear to me what the last two sentences mean)

p. 82

Kretzschmar POV

we can accommodate more than one theory of grammar in the environment for language study that we have inherited

p. 83

Zipf’s law and the 80/20 rule


p. 84

Let us say that each of the thirty gradations on the PlanetMath curve represents one hundred different words. Then, as in Figure 4.3, the first five or six gradations at the left of the graph, about 500 or 600 words, account for about 80 percent of all the running words in the text, while the remaining 24 or 25 gradations in the long tail at the right, the other 2,400 or 2,500 different words, account for only about 20 percent of the running words.

p. 84-85

Pareto Principle / 80/20 rule

20 percent of types corresponds to 80 percent of tokens
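A minimal sketch checking the 80/20 figures from the PlanetMath example, assuming a pure Zipf distribution (frequency proportional to 1/rank, exponent 1) over 3,000 word types, i.e. thirty gradations of one hundred words each. The PlanetMath curve itself is not reproduced here; the exponent and the function names are assumptions for illustration.

```python
def zipf_frequencies(n_types: int, exponent: float = 1.0) -> list[float]:
    """Relative frequencies for ranks 1..n_types under f(r) proportional to 1 / r**exponent."""
    raw = [1.0 / r ** exponent for r in range(1, n_types + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def top_share(freqs: list[float], fraction_of_types: float) -> float:
    """Share of all running words (tokens) covered by the most frequent fraction of types."""
    ordered = sorted(freqs, reverse=True)
    k = round(len(ordered) * fraction_of_types)
    return sum(ordered[:k])


freqs = zipf_frequencies(3000)  # 30 gradations x 100 words
print(f"top 600 types (20%): {top_share(freqs, 600 / 3000):.1%} of tokens")
print(f"top 500 types:       {top_share(freqs, 500 / 3000):.1%} of tokens")
```

Under these assumptions the top 600 types cover about 81 percent of the tokens and the top 500 about 79 percent, which is roughly the range given in the quote.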

p. 86

problem with Zipf’s law

implies that regular behavior, in nature and in language, ought to be described as a law

Zipf’s formula has been superseded by, among others, the mathematician Mandelbrot (who spent his career at IBM). Mandelbrot’s improved formula (1968) shows that the top-ranked words on the curve deviate from the frequency that Zipf expected, and the lower-ranked words also deviate, owing to what he called “the wealth of vocabulary.”

In Linguistic Atlas survey data (not written words in continuous discourse but spoken words and phrases gathered in the field), the top-ranked variant is often three, four, five, even ten times more frequent than the second-ranked variant, and we also see curves that are shallower than a 2:1 ratio between the first and second variant (discussed in depth in Chapter 7).

If Zipf’s Law were really a law, in the same way that thermodynamics and gravity are natural laws, then Zipf’s Law just does not work well enough.

↳ frequency is not always and exactly inversely proportional to rank
↳ the overall shape of the curve is what matters most (exact values may deviate)
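Zipf’s original formula fixes the ratio between the first- and second-ranked items at 2:1. The Zipf–Mandelbrot law, in its usual form f(r) ∝ 1/(r + b)^a, adds a shift b and an exponent a that change this ratio. The sketch below only compares predicted rank-1/rank-2 ratios; the parameter values are arbitrary illustrations, not values fitted to Atlas data.

```python
def rank_ratio(rank_a: int, rank_b: int, a: float = 1.0, b: float = 0.0) -> float:
    """Frequency ratio f(rank_a) / f(rank_b) under f(r) proportional to 1 / (r + b)**a.

    With a = 1 and b = 0 this reduces to Zipf's original law."""
    return ((rank_b + b) / (rank_a + b)) ** a


print(rank_ratio(1, 2))               # Zipf: 2.0
print(rank_ratio(1, 2, a=1.0, b=2.7)) # shift b > 0 flattens the top: about 1.27
print(rank_ratio(1, 2, a=3.0))        # steeper exponent: 8.0
```

The Atlas curves described above, from shallower than 2:1 up to ten times, cannot be captured by the single 2:1 ratio that Zipf’s formula predicts; even the generalised formula needs very different parameter values from one feature to the next.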

English grammar

p. 89

Traditional ⟷ complex-system grammar

traditional grammar	complex-system grammar
grammar is a static structure of rules	grammar is open and dynamic
grammar consists of a hierarchical arrangement of rules	grammar consists of a very large number of interactive components/agents
grammar exhibits fixed relations between elements	grammar shows emergent order
grammar has binary distributions	grammar has non-linear frequency distributions
grammar has homogeneous unity	grammar has the property of scaling

p. 91

neural network theory of activation patterns (Bermudez 2010: 215–245)

explains how information and its frequencies can be processed, without categorization into representations
(no further elaboration)

p. 93

Complexity science and grammar

collocation slot fillers

also follow 80/20 rule
also show A-curve

↳ created by feedback

p. 95

What the 80/20 Rule tells us about grammatical rules, then, is that we know in advance that there will be exceptions. Indeed, exceptions are not rare events, because we can predict that the class of exceptions will account for about 20 percent of the instances of any feature we study. Moreover, the exceptions will account for about 80 percent of the different constructions possible for any feature. Once we understand that the 80/20 Rule is not a curiosity but instead the hallmark of a complex system, we can understand what we take to be grammatical regularities in a different way. Grammatical rules, it turns out, are not laws but more like guidelines.

Grammatical rules are, however, more than mere suggestions because they have a nonlinear frequency curve behind them. Indeed, it is highly likely that the 80/20 distributional pattern gave rise to the idea of grammar in the first place, because speakers of any language perceive that for any question about how to put words together, there will be one or a few constructions that occur a great majority of the time, in the 80 percent group. The idea that there is a fixed objective language hierarchy, a linguistic system, originates as an observational artifact, something that we just perceive to be there because we usually do one of just a few things for any construction.

↳ “epiphenomenal” grammar (Hopper 1987)

our perception of grammar arises as a secondary effect of the complex system of speech
- ⚠ not as an intrinsic cause of how languages work

p. 96

Improved grammars

How can our knowledge of speech as a complex system help us to improve the grammars we create?

p. 97-98

1. prescriptive rules have no foundation

rely on someone’s perception of social acceptability
not rooted in the A-curve of frequency

↳ grammar handbooks (e.g. Quirk et al.) can present their work as pertaining to the 80% most frequent constructions

Instead of presenting rule systems that could be confused for prescriptions, the grammar would freely admit its leakage and build “exceptions” into the discussion of regularities – as the Comprehensive Grammar already does for some rules.

(this is also already the case in the ANS, the Algemene Nederlandse Spraakkunst, for Dutch)

p. 98

2. distinguish between language in use and rational linguistic structure

generativism still offers a substantially elaborated model of logical relations in language

The 80/20 Rule suggests that there is no end to the problem of trying to write rules that can generate all of the acceptable sentences of the language. It cannot happen because of the long tail. Once we accept that infrequent constructions are normal parts of language in use, then we can understand that there are too many constructions to accommodate in any elegant rule system. It will, however, continue to be possible to write rule systems that account for the 80 percent group in the 80/20 Rule, and that is in fact what has mostly been happening already.

p. 100-101

3. scale-free networks

we have to pay close attention to exactly which population of speakers or texts our grammar applies to
so: we need to use valid randomised statistical sampling (in order not to only query the ‘grammar’ of a particular group)

Grammars can only be defined for the speech of one population at a time because, while the 80/20 Rule always applies, it will apply differently for every different population of speakers. There is literally an infinite number of possible grammars, because the number of possible groupings of speakers along the geographical/social continuum is infinite.

Longman grammar shows A-curves with different entries for different registers (p. 102)

p. 105

Complex systems and the history of the English language

p. 109

Grammaticalisation

How to accommodate Hopper’s sense of continual movement, and still be able to describe the grammar of the language at any moment in time?

p. 111

process of change

described as frequency change within a complex system

↓ how to deal with it?

observe the non-linear distribution and scaling properties of complex systems
put the 80/20 rule to good practical use
- (though this is immediately dismissed – I’m confused)

On the other hand, in structural grammars that collect paradigmatic lists of possible constructions, there is no good linguistic reason to privilege the most common variants as having been “selected” and therefore have status as being “grammatical” and to relegate less-common variants to “noise” in the system.

p. 112

A-curve frequency profiles

“change” in the complex system of speech for historians

an alteration in frequency of any particular feature
not the selection of one form over another

The increased use of clockwork in the figurative sense has not eliminated the literal sense of clockwork, but the latter has been ousted to a certain degree. Yet it still survives in the low frequency slumber of language.

↓ exteriorisation

p. 113

S-curve

a way to track feature frequency
has already been described for the progress of linguistic change (notably in Kroch 1989; Labov 1994: 65–67)

S-curve ⟷ A-curve

“different expressions of the same basic distributional facts”

The S-curve just describes the successive frequencies of a single variant at different moments in time. [A figure in the book] shows two different A-curves that correspond to different moments in time for the same variant, and locates the position of the variant on each curve.

(p. 115)

[T]his focus on frequency distributions, rather than qualitative change, also allows explicitly for what Laura Wright (2000: preface) has waggishly called “W curves” that describe increases and decreases in the frequency of the same form over time.

(p. 115)
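A minimal sketch of how an S-curve and A-curves can express the same frequencies. It assumes a logistic S-curve (a common modelling choice for change in progress, not something these notes specify) for an incoming variant competing with an older one, plus a few low-frequency variants with fixed shares. All variant names, shares and parameters are hypothetical.

```python
import math


def s_curve(t: float, midpoint: float = 50.0, rate: float = 0.15) -> float:
    """Logistic S-curve: share of the incoming variant among {old, new} at time t."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical low-frequency variants that persist in the long tail at fixed shares.
TAIL = {"tail_a": 0.06, "tail_b": 0.03, "tail_c": 0.01}


def a_curve_at(t: float) -> list[tuple[str, float]]:
    """All variants sorted by relative frequency at time t, i.e. the A-curve at that moment."""
    head = 1.0 - sum(TAIL.values())  # share left for the two competing variants
    share_new = s_curve(t) * head
    freqs = {"old": head - share_new, "new": share_new, **TAIL}
    return sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)


def rank_of(variant: str, t: float) -> int:
    """1-based position of a variant on the A-curve at time t."""
    return [name for name, _ in a_curve_at(t)].index(variant) + 1


# The S-curve value and the rank of the new variant on the A-curve at each moment.
for t in (10, 30, 70, 90):
    print(t, round(s_curve(t), 3), rank_of("new", t))
```

Under these parameters the incoming variant sits at rank 5 at t = 10, rank 3 at t = 30 and rank 1 from t = 70, while the old variant falls to rank 5 by t = 90 but keeps a non-zero share, in the spirit of the clockwork example.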

(the rest of the chapter talks about English specifically and some evolutionary things)

p. 131

Neural networks and complex systems

(fluff)

p. 134

Models in cognitive science

neural networks

can carry information without representations

p. 135

linguistic expectations

is our idea of storage of linguistic units influenced by the Chomskyan tradition?

p. 136

↳ might as well be stored in a neural network

↓ so

speech in the brain

a massively interconnected process

Bybee’s proposal that phonological information is stored on the basis of words (2001) must be understood in neuroscience not as the brain having some single physical location to store a word, not as a representation, but rather as the brain having a collection of interconnected neuronal pathways whose activation is related to a word.

p. 138

Neural network simulations

(weird things going on here)

p. 155

Sociolinguistics, communities, and complex systems

complexity science and language

defines the relationship between language in use and generalisations we make from it
make use of patterns of language as people use it

sociolinguistics

wants to understand patterns of language variation

Classic North American sociolinguistics

p. 156

“coexistent systems”

by William Labov
idea: there are parallel grammars that correspond to different social groups or environments

↳ language change in progress

sociolinguistics shows linguistic “continual readjustment in social context”

p. 157

Focus on systems

The heterogeneous character of the linguistic systems discussed so far is the product of combinations, alternations, or mosaics of distinct, jointly available subsystems. Each of these subsystems is conceived as a coherent, integral body of rules of the categorial, Neogrammarian type: the only additional theoretical apparatus needed is a set of rules stating the conditions for alternation.

(Weinreich, Labov, and Herzog 1968: 165)

p. 158

variability in Labovian sociolinguistics

consists of “alternations” → combinations / substitutions of rules from one system with rules of another
⇒ no ‘real’ addressing of usage-based linguistics

p. 159

If we apply a complex systems view to this analysis, we can see that Labov has managed the subgroups in his data but has not changed what he claims to be studying. He begins with the top level of scale, New York City. He then subtracts the black speakers so that his results reflect only the white speakers, no longer all of New York City. Finally, he subtracts the upper- and lower-class whites so that he ends up with just the working-class white speakers, an even smaller part of the population of New York City. According to the scaling property of complex systems Labov is entitled to examine whichever groups of New York City speakers he wants, but he is then no longer entitled to say that he is still talking about New York City as a whole. According to complex systems it is simply an error to think that the working-class speakers in New York represent the whole city.

p. 161

Style and class

The vernacular is positioned maximally distant from the idealized norm [citations of J. Milroy and S. Poplack]. Once the vernacular baseline is established, the multidimensional nature of speech behavior can be revealed . . . Thus, the unmonitored speech behavior of the vernacular enables us to tap in to the broader dimensions of the speech community. In other words, the vernacular is the foundation from which every other speech behavior can be understood.

(Tagliamonte 2006: 8)

p. 162

↳ does the ‘vernacular’ exist?

any speech is variable at all times
there is no reason for the vernacular to be the ‘most natural’ speech event (Milroy & Gordon 2003)

⇒ the framework is too simple to account for the truly multidimensional interactions we encounter in speech
- no small, fixed set of coexistent systems could account for the multidimensional patterning of language in use from complex systems (p. 163)

p. 163

evolution

sociolinguistics moved beyond simple social groups
- e.g. social networks and communities of practice
step in the right direction

p. 164

‘coexistent systems’ and complex systems

can be seen as the output of complex systems
- “frequency-based patterns of variants for every linguistic feature for every community at every scale”
however: wrong to limit this to a fixed set of coexistent systems

Speech communities and populations of speakers

skipped

p. 167

New varieties

p. 170

An experiment on scale-free communities

p. 172-173

A-curve

appears in all types of groups, regardless of what meaningful tranche is cut in the data

p. 173

Types of A-curves
Type A	Type B	Type C
single highly frequent variant	two highly frequent variants (“bump”)	three or more highly frequent variants
several variants with moderate frequencies		quite many variants with moderate frequencies
many variants with low frequencies
long tail
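The three types differ mainly in how many variants sit at the top of the curve. A minimal sketch of one way to operationalise that; the rule (a variant counts as "highly frequent" if it reaches at least half the count of the top variant) and the 0.5 threshold are hypothetical choices, not taken from the book.

```python
from collections import Counter


def a_curve_type(counts: Counter, relative_threshold: float = 0.5) -> str:
    """Label a frequency profile as Type A, B or C (hypothetical operationalisation).

    A variant counts as 'highly frequent' if its count is at least
    relative_threshold times that of the top-ranked variant."""
    ranked = counts.most_common()  # the A-curve: most frequent variant first
    top_count = ranked[0][1]
    n_high = sum(1 for _, count in ranked if count >= relative_threshold * top_count)
    if n_high == 1:
        return "A"
    if n_high == 2:
        return "B"
    return "C"


# Invented counts, for illustration only.
print(a_curve_type(Counter({"v1": 80, "v2": 10, "v3": 5, "v4": 5})))              # A
print(a_curve_type(Counter({"v1": 40, "v2": 35, "v3": 10, "v4": 5})))             # B
print(a_curve_type(Counter({"v1": 30, "v2": 25, "v3": 20, "v4": 15, "v5": 10})))  # C
```

A different threshold would move borderline profiles between types, which is one reason to treat such labels as descriptive rather than categorical.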

p. 174

Type C curves tend to occur more often when the ratio of speakers per type of response is smaller, as when the number of speakers is small, or the data is subject to more finely graded differentiation of variation (as for the small phonetic differences), or there is an unusually large number of variants (as for cobbler).

We see that it is common, if not typical, for there to be a single variant that is top ranked in all the A-curves, massively more common than any other variant for the same item. However, at the same time, it is typically the case that there are many other variants, mostly the same ones in different subsamples, and they have different relative frequencies and thus different orders on the A-curve.

Our perception of categorical differences between populations of speakers, of common variants unique to particular populations, is simply not supported by the speech production evidence.

p. 177

Type A

most common (80% of the time)
has a very clear “top form” → makes us perceive speech as ‘categorical’

other types

less common (20% of the time)
do not have a clear “top form”

p. 180

As reported above, we should agree with Horvath and Horvath (2001, 2003) that it is a basic fallacy to think that the behavior of any smaller group will necessarily be the same as the behavior of the larger group of which it forms a part, or that the behavior of any group overall will predict the behavior of its sub-populations.
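A minimal sketch of the point that a group’s overall behaviour need not predict its sub-populations. Two hypothetical subgroups with invented counts are pooled with `Counter` addition, and the variant order on each A-curve is compared.

```python
from collections import Counter

# Hypothetical variant counts for one feature in two subgroups (invented numbers).
group_1 = Counter({"v1": 60, "v2": 25, "v3": 10, "v4": 5})
group_2 = Counter({"v2": 70, "v1": 15, "v4": 12, "v3": 5})

combined = group_1 + group_2  # the larger group both subgroups belong to


def ranking(counts: Counter) -> list[str]:
    """Variant order on the A-curve, most frequent first."""
    return [variant for variant, _ in counts.most_common()]


print(ranking(group_1))   # ['v1', 'v2', 'v3', 'v4']
print(ranking(group_2))   # ['v2', 'v1', 'v4', 'v3']
print(ranking(combined))  # ['v2', 'v1', 'v4', 'v3']
```

The pooled group’s top variant (v2) is not group 1’s top variant (v1), so reading the larger group’s A-curve as a description of every subgroup would misrepresent group 1.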

Measurement of scale-free data

p. 181

Kretzschmar, Kretzschmar & Brockman (2013)

Gini Coefficient and Lorenz Curve to model linguistic data

↳ Lorenz Curve

models the distribution of wealth in a society

The members of a population are represented on the x-axis by rank of the amount of wealth they hold, low to high in order left to right, and the percentage of the total wealth of the population is represented on the y-axis, so that a chart represents a cumulative distribution of wealth in society. In practice there is always a curve that represents the relative inequality of wealth in a population, as opposed to the hypothetical straight line of perfect equality of wealth.

↳ Gini Coefficient

ratio of the area between the line of perfect equality and the actual Lorenz curve to the total area under the line of perfect equality

When the Lorenz Curve is charted, then, the deeper the curve, the higher the Gini Coefficient, and the more unequal the wealth.
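A minimal sketch of the Lorenz curve and Gini Coefficient applied to linguistic data, assuming (as in these notes) that the variants of a feature play the role of population members and their token counts the role of wealth. The area is computed with the trapezoid rule; this is one common discrete convention, and other conventions (e.g. small-sample corrections) exist. The counts are invented.

```python
def lorenz_points(values: list[float]) -> list[tuple[float, float]]:
    """Lorenz curve: (share of members, cumulative share of total), members ranked low to high."""
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((i / n, running / total))
    return points


def gini(values: list[float]) -> float:
    """Area between the equality line and the Lorenz curve, divided by the area
    under the equality line (0.5), i.e. 1 - 2 * (area under the Lorenz curve)."""
    points = lorenz_points(values)
    # Trapezoid rule over consecutive Lorenz points.
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1 - 2 * area


# Hypothetical token counts for five variants of one feature.
print(round(gini([20, 20, 20, 20, 20]), 3))  # equal frequencies: no A-curve
print(round(gini([80, 10, 5, 3, 2]), 3))     # steep A-curve
```

Equal frequencies give a Gini of (essentially) 0, while the steep profile gives about 0.65: the deeper the Lorenz curve, the higher the coefficient.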

↓

p. 182

Gini for phonetic data

shows relative inequality of frequencies

p. 183

Issues with Gini Coefficient

1. binning

how many categories do we use to divide up continuous data?

If we consider the problem from the binning side, we can see that a very large number of categories, say one category per token, will not show the A-curve because it yields a linear chart with a slope of zero – a horizontal line. On the other hand, frequency data sorted into just two categories also gives us a line, with a slope that depends on the difference between the two category values. Thus the outer limits for the possible number of categories are both linear, and the nonlinear A-curve can only be observed when the number of categories into which the data is sorted lies between these two extremes.

This problem starts to make sense once you consider that speech can be expressed as formant data (F1, F2, F3, etc.), and that these continuous values have to be binned in order to compute the Gini Coefficient.
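A minimal sketch of the binning problem. The F1 values are synthetic (drawn from a normal distribution around 500 Hz), not Atlas measurements, and equal-width bins are an assumed binning scheme. It uses the closed-form discrete Gini over category frequencies.

```python
import random
from collections import Counter


def gini_of_counts(counts: list[int]) -> float:
    """Discrete Gini coefficient of category frequencies (0 = all categories equally frequent)."""
    ordered = sorted(counts)
    n, total = len(ordered), sum(ordered)
    weighted = sum(i * c for i, c in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def binned_counts(values: list[float], bin_width: float) -> list[int]:
    """Token count per category when measurements are cut into equal-width bins."""
    return list(Counter(int(v // bin_width) for v in values).values())


rng = random.Random(1)
# Hypothetical F1 values (Hz) for one vowel: synthetic, not real measurements.
f1 = [rng.gauss(500, 60) for _ in range(300)]

# Extreme case: one category per token, so every frequency is 1 (flat line, Gini 0).
print("one category per token:", round(gini_of_counts([1] * len(f1)), 3))
# Intermediate widths, and a very wide bin that leaves only about two categories.
for width in (10, 50, 400):
    counts = binned_counts(f1, width)
    print(f"bin width {width:>3} Hz: {len(counts):2d} categories, Gini {gini_of_counts(counts):.2f}")
```

Only the intermediate bin widths produce a frequency profile with enough categories to show a curve; at the extremes the sorted frequencies form a flat line or a two-point line, as the quote describes.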

p. 187-189

2. sample size

you need just a few samples per speaker

p. 192

usefulness of Gini Coefficients

work well to assess the differences in A-curves that arise
working with a small number of bins and a small sample is highly likely to give misleading results

Complex systems for sociolinguists

p. 195

1. exact naming

sociolinguists need to be more exact in naming the populations of speakers for any study

So, for example, the classic sociolinguistic snowball (or “friend of a friend”) method of acquiring speakers in a community is likely to introduce bias since the people are known to each other, prima facie evidence that they are involved in a social network – so that we will be measuring speech just in the social network instead of speech in the larger community.

2. take interest in other levels of scale in which speakers participate

it is no longer good enough just to assume that upper-class speakers use the “standard” language
standard usage depends on commitment to educational institutions in which standards are transmitted

p. 196-197

3. use randomised sampling

avoid unintentional bias from groups other than the one under study
groups are not homogeneous! there are always subgroups within one group

Having decided on one neighborhood, or another demographic segment, it is important to use randomized sampling to select participants. All members of the study group will also belong to other groups as well, and researchers need to avoid unintentional bias from those other connections.

sampling recommendations

at least 30 observations
ideally: 250-300 observations
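A minimal sketch of why sample size matters for frequency profiles: repeated random samples are drawn from a hypothetical A-curve population, and the spread of the estimated share of the top variant is compared for 30 and 300 observations. The population shares are invented, and this does not reproduce how the book arrives at its recommended numbers.

```python
import random
import statistics

# Hypothetical population shares for six variants of one feature (an A-curve).
POPULATION = {"v1": 0.70, "v2": 0.12, "v3": 0.08, "v4": 0.05, "v5": 0.03, "v6": 0.02}


def top_share_in_sample(n: int, rng: random.Random) -> float:
    """Share of the top-ranked variant v1 in one random sample of n observations."""
    draws = rng.choices(list(POPULATION), weights=list(POPULATION.values()), k=n)
    return draws.count("v1") / n


def spread_of_estimates(n: int, repetitions: int = 500, seed: int = 0) -> float:
    """Standard deviation of the v1 share across repeated random samples of size n."""
    rng = random.Random(seed)
    estimates = [top_share_in_sample(n, rng) for _ in range(repetitions)]
    return statistics.pstdev(estimates)


for n in (30, 300):
    print(n, round(spread_of_estimates(n), 3))
```

In theory the spread is sqrt(p(1 - p)/n), about 0.08 for 30 observations and about 0.03 for 300, so the larger sample pins down the shape of the curve roughly three times more tightly.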

p. 198

4. account for non-linear frequency patterns

base abstractions on these non-linear frequency patterns
do not use binary distinctions

p. 200

5. reanalyse old data with new insights

it’s possible

p. 201

Postmodernism and complex systems

postmodernism

“seeing science as yet another story”

p. 202

Fashionable nonsense

p. 203

Another major target of Sokal and Bricmont (1998) is what they call “epistemic relativism,” the idea that “the truth or falsity of a statement is relative to an individual or to a social group” (Sokal and Bricmont 1998: 51), and this idea does come closer to postmodernism. As Sokal and Bricmont point out (Sokal and Bricmont 1998: 52):

There is no doubt that the relativist attitude is at odds with scientists’ idea of their own practice. While scientists try, as best they can, to obtain an objective view of (certain aspects of) the world [with allowance in a footnote for “nuances” of the word objective, as in doctrines like realism, conventionalism, and positivism], relativist thinkers tell them that they are wasting their time and that such an enterprise is, in principle, an illusion. We are thus dealing with a fundamental conflict.

p. 204

Objective truth

But why did I do it? I confess that I’m an unabashed Old Leftist who never quite understood how deconstruction was supposed to help the working class. And I’m a stodgy old scientist who believes, naively, that there exists an external world, that there exist objective truths about that world, and that my job is to discover some of them. (If science were merely a negotiation of social conventions about what is agreed to be “true,” why would I bother devoting a huge fraction of my all-too-short life to it? I don’t aspire to be the Emily Post of quantum field theory.) (Sokal and Bricmont 1998: 269)

While there are admitted problems with what might be “objective,” Sokal insists on the objectivity of science as the anchor that saves us from an endless drift of negotiation of the etiquette of truth.

p. 204-205

Postmodernist response

Stanley Aronowitz, one of the editorial team at Social Text, does not put things in quite the same way. In a reply to the Sokal hoax article, he attacks the notion of objectivity (Aronowitz 1997):

So the issue is not whether reality exists, but whether knowledge of it is “transparent.” Herein lies Sokal’s confusion. He believes that reason, logic, and truth are entirely unproblematic. He has an abiding faith that through the rigorous application of scientific method nature will yield its unmediated truth. According to this doctrine there are “objective truths” since the earth revolves around the sun, gravity exists and various other laws of nature are settled matters. So Sokal never interrogates the nature of evidence or facts, and simply accepts them if they have been adduced within certain algorithms that bear the stamp of “science.”

This statement is too strong, since we have seen that Sokal is willing to admit that non-scientific factors could have an influence, just not the primary influence on changes in scientific models. Still, Aronowitz does hit the mark in saying that, for Sokal, the scientific method of observation and reason are beyond attack. He continues:

The point [of studying social connections of science] is not to debunk science or to “deconstruct” it in order to show it is merely a fiction. This may be the postmodern project, but it is not the project of science studies. The point is to show science as a social process, to bring it down to earth, to remove the halo from its head. Scientific truth cannot be absolute; otherwise we might agree with those who have proclaimed the “end” of science. If all knowledge, including natural science, is mediated by the social and cultural context within which it has developed, then its truths are inevitably relational to the means at hand for knowing. In fact, in much of micro-physics what is called observation is often the effects of machine technologies, a reading of effects. But the reading is theory-laden. Which means pure description based on observation is not possible. Scientists require other tools such as machines, mathematics, and infer what they see from what they believe.

This is a very strong position. To claim that the use of technology, machines, must mean that “pure description based on observation is not possible” is surely an overstatement, and to say that scientists “infer what they see from what they believe” is a reactionary conclusion that, setting aside the opening disclaimer of the paragraph, does make science into a fiction. Aronowitz ends with the postmodern project that he denied a few lines before.

p. 207

Postmodernism and speech

(this is really just interesting to me and not really relevant anymore, so I stopped taking notes)

p. 210

So, it is a postmodern view to claim, as many of us now do, that speech is essentially local, and that language variation begins in small groups at the bottom of the scale-free network of speech that rises to broad regional and social continua of speech.

p. 212

It should be clear from this volume that the only way we can understand the complex system of speech is to count tokens and to assemble frequency profiles for the variants of linguistic features.
